1. Introduction



As a commercial vendor of single-crystal x-ray diffractometers,
Siemens¹ both produces data which goes
into crystallographic databases, and searches these data- 
bases during the process of single-crystal structure 
determination. This paper describes how NIST Crystal 
Data is incorporated with the graphical user interfaces 
of the XSCANS software for single crystal diffractome- 
ters, and of the SMART software for two-dimensional 
CCD diffractometers. Reasons for choosing this partic- 
ular database and ideas for future access to this and 
other crystallographic databases are described. 

2. Primary Application of NIST 
Crystal Data Search 

Siemens small-molecule diffractometer users are also
users of the NIST Crystal Data Identification File. The
diffractometer users are researchers studying new
chemicals, minerals and pharmaceuticals. In decreasing
order of frequency, their fields are chemistry (inorganic,
organometallic, and organic), crystallography, mineralogy,
and materials science. In North America, their affiliations
are 75% academic, 15% industrial, and 10%
government. The primary application in searching NIST
Crystal Data is compound identification. Specifically, it
is to compare the unit cell of a new single crystal specimen
being studied on the diffractometer to previously
studied unit cells in the database. It is important to note
that this search can be done before spending the time to
collect the single-crystal data itself. Thus, the primary
reason to search NIST Crystal Data is to maximize productivity
by not re-collecting data on a known compound.

¹ Certain commercial equipment, instruments or materials are identified
in this paper to foster understanding. Such identification does not
imply recommendation or endorsement by the National Institute of
Standards and Technology, nor does it imply that the materials or
equipment identified are necessarily the best available for the purpose.

2.1 How Often Known Structures are 
Redetermined 

Even highly experienced researchers may find them- 
selves redetermining a known compound as the number 
of new single crystal structures grows far beyond human 
ability to remember each one. There are 197 500 entries
with lattice parameters in the 1994 version of NIST
Crystal Data, incorporating entries on all classes of crystalline
materials such as inorganics, minerals, metals,
intermetallics, organics, and organometallics. Cases of 
crystal-structure redetermination of common starting 
materials which were recrystallized, and redetermina- 
tions of compounds previously reported in lesser known 
or foreign-language journals, have been reported to us 
from time to time. To try to attach a quantitative figure 
for how often these redeterminations happen, the results 
of the Pittsburgh Summer School in crystallography 
were surveyed from 1992 through 1995. This school 
provides a 10-day course of intensive lectures and hands-on
data collection for approximately 30 students each
summer. Each student is asked to bring a crystal of an 
unknown compound on which to collect data and solve 
the structure. Each year, one or two of these students have
unexpectedly found a match for the supposedly unknown
specimen. Complete results are given in Table 1.
If the NIST Crystal Data search is done prior to data 
collection, the student has time to study a different 
unknown compound. As more and more classes in- 
corporate complete structure determinations as projects, 
the NIST Crystal Data search will become more im- 
portant. 

Table 1. Frequency of known compounds identified by a NIST Crystal
Data search at Pittsburgh Summer School in Crystallography

Year    No. structures    No. known compounds
        solved            expected    unexpected

1992    20                2           1
1993    18                            1
1994    27                            1
1995    19                3           2
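
As a rough worked figure, the counts in Table 1 can be pooled over the four years. The sketch below uses only the "structures solved" and "unexpected" columns as read from the table. The variable names are illustrative, and pooling the years into a single rate is an assumption made here, not a calculation reported by the survey.

```python
# Counts read from Table 1: year -> (structures solved, unexpected known compounds).
table1 = {
    1992: (20, 1),
    1993: (18, 1),
    1994: (27, 1),
    1995: (19, 2),
}

total_solved = sum(solved for solved, _ in table1.values())
total_unexpected = sum(unexpected for _, unexpected in table1.values())

# Pooled fraction of supposedly unknown specimens that matched a known compound.
pooled_rate = total_unexpected / total_solved

print(f"{total_unexpected} of {total_solved} structures "
      f"({pooled_rate:.1%}) were unexpected matches")
```

On these counts, 5 of the 84 structures solved, or about 6%, were unexpected matches.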



3. How the NIST Crystal Data 
Search is Implemented 

The algorithms to search NIST Crystal Data have 
been written at Siemens, since it was necessary to 
embed them in the graphical user interfaces which 
control the serial four-circle or two-dimensional CCD 
diffractometers. Both types of instruments are con- 
trolled by a PC, with a CD-ROM attached which con- 
tains the NIST Crystal Data Identification File obtained 
through the International Centre for Diffraction Data
[1]. An index file for very fast search is created locally 
on the PC hard disk by a Siemens utility program when 
the file is first installed and when it is updated annually. 
A simple point-and-click menu initiates the NIST Crys- 
tal Data search, utilizing the unit-cell information al- 
ready determined by the diffractometer. When a match 
for the unit cell is found within user-chosen tolerances, 



the CD-ROM is accessed for detailed information on the 
known unit-cell. This information includes the known 
unit cell parameters a, b, c, α, β, γ, and cell volume;
compound name and formula; literature reference; and 
some descriptive information. The information is dis- 
played and stored for future access. 

The original algorithms incorporate many ideas 
derived from discussions with Alan Mighell, Vicky 
Karen, and colleagues at NIST as far back as 1977 when 
NIST was the National Bureau of Standards [2,3,4]. The 
algorithms were updated after the 1992 American 
Crystallographic Association meeting, where Rodgers 
and LePage [5] discussed searching NIST Crystal Data 
when large uncertainties are present. Cell-comparison 
techniques discussed by Andrews et al. [6] have also 
been considered. Algorithms are designed to ensure that 
no known unit cells are missed in the search. The output 
may sometimes present numerous candidates for a 
match, but this can be screened readily by the researcher 
and is not considered problematic since the search is 
done only once per new crystal studied. 
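
One way to picture an index that is fast yet does not miss candidates is a sorted list searched with a deliberately generous window, followed by a detailed comparison of each candidate. The sketch below is illustrative only. The paper does not describe the key or format of the Siemens index file, so indexing on cell volume, the names `build_index` and `candidates`, and the record layout are all assumptions, and the records are invented.

```python
import bisect
import math


def cell_volume(a, b, c, alpha, beta, gamma):
    """Volume of a unit cell from axial lengths and angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca * ca - cb * cb - cg * cg + 2 * ca * cb * cg)


def build_index(records):
    """Return (sorted volumes, record ids in the same order).

    `records` maps a hypothetical record id to a cell tuple
    (a, b, c, alpha, beta, gamma).
    """
    pairs = sorted((cell_volume(*cell), rec_id) for rec_id, cell in records.items())
    return [v for v, _ in pairs], [rec_id for _, rec_id in pairs]


def candidates(index, volume, tol):
    """Record ids whose volume lies within a fractional tolerance of `volume`."""
    volumes, ids = index
    lo = bisect.bisect_left(volumes, volume * (1 - tol))
    hi = bisect.bisect_right(volumes, volume * (1 + tol))
    return ids[lo:hi]


# Invented records for illustration; not entries from NIST Crystal Data.
records = {
    "rec-A": (12.0, 12.0, 12.0, 90.0, 90.0, 90.0),
    "rec-B": (5.0, 6.0, 7.0, 90.0, 100.0, 90.0),
    "rec-C": (10.0, 10.0, 10.0, 90.0, 90.0, 90.0),
}
index = build_index(records)
unknown_volume = cell_volume(12.05, 12.05, 12.05, 90.0, 90.0, 90.0)
print(candidates(index, unknown_volume, tol=0.03))
```

A wide window returns more candidates than a narrow one, which matches the trade-off described above: extra candidates are cheap to screen, whereas a missed known cell costs a full data collection.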

3.1 Four-Circle Serial Diffractometers 

When the crystal data are to be collected on a serial 
four-circle diffractometer, the average time to proceed 
from mounting the unknown crystal on the diffractome- 
ter to determining the precise unit cell and Bravais lat- 
tice is approximately 1 1/2 hours, as shown in Fig. 1. 
This assumes no human intervention between initial 
screening for the suitability of the crystal and initiating 
the NIST Crystal Data search. This point, prior to data 
collection, is the best time to search the NIST Crystal 
Data to see if the compound on the diffractometer has 
been studied previously, since a further 6 h to 20 h typ- 
ically will be needed to collect the single-crystal data 
and to solve the structure, as shown in Fig. 2. If the unit 
cell is very large or the crystal weakly diffracting, this 
time could extend to many days of data collection. Thus
both human and instrument productivity are much
increased if this time is not spent on a known compound.

3.2 Two-Dimensional CCD Diffractometer 

Siemens has recently introduced a new type of dif- 
fractometer for collecting data on single crystals. This is 
called SMART (Siemens Molecular Analysis Research 
Tool) and incorporates a novel two-dimensional CCD 
detector capable of collecting many reflections simulta- 
neously [7]. With this new instrument, the average time 
to proceed from mounting the unknown crystal on the 
CCD diffractometer to determining the precise unit cell 
and Bravais lattice is only 12 min, as shown in Fig. 3. 
This short time leads us to believe that the SMART 
system, coupled with a NIST Crystal Data search, could 
be used to identify unknown single crystals in much the
same way powder samples are identified using a powder 
diffractometer today. The initial reason for searching the 
NIST Crystal Data, to avoid re-collecting data on a 
known compound, is still important. From Fig. 4, we see 
that a further 1 h to 8 h to collect the data and solve the 
structure is required. Many researchers choose a CCD 
diffractometer system for crystals which are very weakly 
diffracting or very small, for example 10 µm on a side.
For such crystals data collection may take up to a day. 

3.3 Examples of NIST Crystal Data Search 

The search is initiated using precise unit-cell parame- 
ters, determined on the diffractometer immediately 
prior to the search, or typed into the search menu from 
previously determined data. Parameters used by the
search are: unit-cell axial lengths a, b, c; axial angles α,
β, γ; tolerance for the match as a fraction; lattice centering
type (such as P for primitive); whether to search the
organic or inorganic file or both; an output file name to
store results; and whether to display short or verbose
output.

Fig. 1. Four-circle diffractometer search time. [Flowchart: mount
goniometer head on instrument; optically align sample; take Polaroid
rotation photo; search for and center reflections; autoindex
reflections; calculate least-squares unit cell and reduced primitive
cell; take axial photographs; search for fractional cells; add high
angle centered reflections; determine precise unit cell and Bravais
lattice; search NIST Crystal Data File. Time to identify if known
compound = 100 minutes.]

Fig. 2. Time to collect data on a four-circle diffractometer.
[Flowchart: verify Laue symmetry; collect data serially, collect psi
scan data, and index crystal faces; reduce data and determine space
group; solve structure; refine structure. Total time for single
crystal structure = 6 hours to 20 hours typically (1 week maximum).]

Typically no hits will be shown, meaning no match is 
found for the new unit cell. Sometimes a match indicates 
an isomorphous compound with very similar unit cell 
but different chemical elements. One example was a 
match between the newly collected compound [formula illegible]
with a primitive cubic unit cell 17.79, 17.79, 
17.79, 90.00, 90.00, 90.00 and the database compound
C44H54Br4Cd4N2S6 with unit cell parameters 17.869,
17.869, 17.869, 90.00, 90.00, 90.00, published earlier 
by Dean et al. at the University of Western Ontario [8]. 
Another example was a badly split crystal of a vanadium 
compound. In spite of the poor crystal quality, sufficient 
reflections were found to determine the unit cell and 
search NIST Crystal Data to match vanadyl hydrogen 
phosphate [9,10]. 
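
The cubic example above can be checked with a direct parameter-by-parameter comparison. The sketch below is a simplified stand-in, not the Siemens algorithm. It assumes both cells are already expressed in the same reduced setting, applies the fractional tolerance to each axial length and angle separately, and uses an illustrative function name, `cells_match`.

```python
def cells_match(unknown, known, tol):
    """True if every parameter of `known` is within a fractional
    tolerance `tol` of the corresponding parameter of `unknown`.

    Cells are (a, b, c, alpha, beta, gamma); both are assumed to be
    in the same reduced setting, which a real search must ensure.
    """
    return all(abs(u - k) <= tol * u for u, k in zip(unknown, known))


# Cell parameters quoted in the text for the isomorphous pair.
new_cell = (17.79, 17.79, 17.79, 90.00, 90.00, 90.00)
database_cell = (17.869, 17.869, 17.869, 90.00, 90.00, 90.00)

for tol in (0.001, 0.01):
    print(tol, cells_match(new_cell, database_cell, tol))
```

The axial lengths of the two cells differ by about 0.4%, so the pair matches at a tolerance of 0.01 but not at 0.001. This shows why the user-chosen tolerance decides whether an isomorphous compound appears among the hits.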

Fig. 3. CCD diffractometer search time. [Flowchart: mount goniometer
head on instrument; optically align sample; collect still or rotation
electronic photograph; collect subset of frames for indexing;
autoindex reflections; calculate least-squares unit cell and reduced
primitive cell; determine Bravais lattice; search NIST Crystal Data
File. Time to identify if known structure = 12 minutes.]

Fig. 4. Time to collect data on a CCD diffractometer. [Flowchart:
optionally index crystal faces; collect and integrate frame data;
refine final cell constants; reduce data, correct for absorption, and
determine space group; solve structure; refine structure. Total time
for single crystal structure = 1 hour to 8 hours typical (1 day
maximum).]



Students at the Pittsburgh Summer School in crystal- 
lography are encouraged to search NIST Crystal Data 
prior to collecting data. An example for adenine hy- 
drochloride hemihydrate, brought by N. Sparks [11] as 
a known compound, is shown in Fig. 5. Three hits in 
the organic file showed that the structure had been 
published previously three times, first in 1948. 

4. Data Base Accessibility to 
Vendors and Users 

The NIST Crystal Data is the only crystallographic 
database for which the search algorithms have been 



integrated with Siemens' diffractometer control soft- 
ware. The volume of new small-molecule crystal struc- 
tures created a real need to maximize productivity of 
both the researcher and the instrument. The NIST Crystal
Data database contains the unit-cell information needed
prior to data collection. However, it was not until NIST 
made the database available on a simple medium, a 
CD-ROM in PC format, that it became feasible to access 
it from vendor software. It was also important that reg- 
ular updates (annually) were easy to obtain through 
ICDD [1], and that the reasonable license fee to indus- 
trial as well as academic users made it commercially 
feasible. 

Fig. 5. CDF search results on adenine hydrochloride hemihydrate.
[Screen capture of the search output: 3 hits with the organic file;
hit 1 is adenine hydrochloride hemihydrate, C5H7ClN5O0.5, Acta
Crystallogr. 1, 324 (1948), Broomhead, J. M.]



4.1 Enhanced Accessibility of Crystallographic 
Databases 

For crystallographic databases in general, some 
short-term recommendations can be made for today's 
computing environment. At present Siemens small- 
molecule users prefer access on PC computers, followed 
by Silicon Graphics workstations. Siemens protein-crystallography
users prefer Silicon Graphics workstations.
CD-ROM as a medium for local database storage
is universal, simple, and inexpensive. While access over 
the Internet is attractive and growing rapidly, not every 
laboratory is attached. The cost of license fees, espe- 
cially if access to several databases is needed, can be a 
barrier, particularly for industrial users. Database ven- 
dors might project maximum revenue using different 
scenarios ranging from low market price for a large 
number of users, to high market price for a smaller 
number of users. Market awareness of some of the data- 
bases should be enhanced, as the biggest barrier to using 
them is potential users' lack of knowledge of what they
can be used for, or indeed of their very existence. 



Longer term recommendations can be more sweep- 
ing. Even now, the majority of single-crystal structures 
are not accessible in the databases because they have not 
yet been published. The barriers to publication include 
lack of time to prepare the publication; lack of desire to 
publish because the structure is not "good enough" but 
the chemistry is adequately determined; or the com- 
pound is proprietary. To address this, worldwide elec- 
tronic deposition of unpublished data, with validation 
safeguards, must be strongly encouraged. Easy ways to 
create and search local databases of proprietary com- 
pounds could be provided. With the advent of two-di- 
mensional detectors, the number of new single-crystal 
structures will grow exponentially. In the field of protein 
crystallography, this has already happened with the in- 
troduction of area detectors over a decade ago. In small 
molecule crystallography, we are just beginning the tran- 
sition. From Fig. 2 we can see that if a serial diffrac- 
tometer takes from a few hours to a few days to collect 
data, then a sustained productivity of up to 150 new 
structures a year is possible for one such instrument. 
With the new CCD diffractometer, from Fig. 4 we can 
see that this number could increase to as many as two to
three structures per day if suitable crystals are available. 
Database providers will need to discover innovative 
ways to find and validate this data, including that in 
foreign-language and lesser known publications, and to 
update the databases ever more rapidly. 
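
The throughput figures above can be put on a common yearly basis. This is back-of-the-envelope arithmetic only. It assumes the instrument runs every day of a 365-day year and that suitable crystals are always available, and the variable names are illustrative.

```python
DAYS_PER_YEAR = 365  # assumption: continuous operation, no downtime

# Figures quoted in the text.
serial_per_year = 150   # upper bound for one serial diffractometer
ccd_per_day = (2, 3)    # CCD diffractometer, structures per day

serial_days_per_structure = DAYS_PER_YEAR / serial_per_year
ccd_per_year = tuple(n * DAYS_PER_YEAR for n in ccd_per_day)
ratio = tuple(n / serial_per_year for n in ccd_per_year)

print(f"serial: one structure every {serial_days_per_structure:.1f} days")
print(f"CCD: {ccd_per_year[0]} to {ccd_per_year[1]} structures per year")
print(f"ratio: {ratio[0]:.1f}x to {ratio[1]:.1f}x")
```

Under these assumptions a single CCD instrument could yield several times as many structures per year as a serial instrument, which is the growth in volume that database providers would need to absorb.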